Search | BVS Regional Portal

1.

Heterogeneous associations between interleukin-6 receptor variants and phenotypes across ancestries and implications for therapy.

Wang, Xuan; Liu, Molei; Nogues, Isabelle-Emmanuella; Chen, Tony; Xiong, Xin; Bonzel, Clara-Lea; Zhang, Harrison; Hong, Chuan; Xia, Yin; Dahal, Kumar; Costa, Lauren; Cui, Jing; Gaziano, J Michael; Kim, Seoyoung C; Ho, Yuk-Lam; Cho, Kelly; Cai, Tianxi; Liao, Katherine P.

Sci Rep ; 14(1): 8021, 2024 04 05.

Article in English | MEDLINE | ID: mdl-38580710

ABSTRACT

The Phenome-Wide Association Study (PheWAS) is increasingly used to screen broadly for potential treatment effects, e.g., using an IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power of clinical trials to study differential treatment effects across patient subgroups. However, few methods exist to test efficiently for differences across subgroups among the thousands of comparisons generated by a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied it to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes (T2D) in AFR (OR 0.96) vs EUR (OR 1.0; p-value for heterogeneity = 8.5 × 10⁻³) and higher white blood cell count (p-value for heterogeneity = 8.5 × 10⁻¹³¹). These data suggest a more salutary effect of IL6R blockade on T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.
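The abstract reports heterogeneity p-values without describing the test behind them. For orientation only, the sketch below implements a standard two-sided z-test for the difference between two independent log odds ratios. It is not the authors' power-maximizing procedure, and the function name and inputs (each group's odds ratio plus the standard error of its logarithm) are illustrative assumptions; the abstract does not report standard errors, so the example cannot reproduce its p-values.

```python
import math


def heterogeneity_z_test(or_a, se_log_or_a, or_b, se_log_or_b):
    """Two-sided z-test of H0: log(OR_a) == log(OR_b) for two independent estimates.

    Hypothetical helper for illustration; se_log_or_* are standard errors of log(OR).
    Returns (z, p_value).
    """
    diff = math.log(or_a) - math.log(or_b)
    # Independent groups, so the variances of the two log-ORs add.
    se_diff = math.sqrt(se_log_or_a ** 2 + se_log_or_b ** 2)
    z = diff / se_diff
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A test of this form has power only for a single pre-specified contrast; screening thousands of PheWAS traits additionally requires multiple-testing control, which is the gap the paper's method targets.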

Subject(s)

Diabetes Mellitus, Type 2; Humans; Diabetes Mellitus, Type 2/drug therapy; Diabetes Mellitus, Type 2/genetics; Genetic Association Studies; Phenotype; Polymorphism, Single Nucleotide; Receptors, Interleukin-6/genetics

2.

Trans-Balance: Reducing demographic disparity for prediction models in the presence of class imbalance.

Hong, Chuan; Liu, Molei; Wojdyla, Daniel M; Hickey, Jimmy; Pencina, Michael; Henao, Ricardo.

J Biomed Inform ; 149: 104532, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38070817

ABSTRACT

INTRODUCTION: Risk prediction, including early disease detection, prevention, and intervention, is essential to precision medicine. However, systematic bias in risk estimation caused by heterogeneity across demographic groups can lead to inappropriate or misinformed treatment decisions. In addition, low-incidence (class-imbalanced) outcomes degrade the classification performance of many standard learning algorithms, which further exacerbates racial disparities. It is therefore crucial to improve the performance of statistical and machine learning models in underrepresented populations in the presence of heavy class imbalance. METHOD: To address demographic disparity in the presence of class imbalance, we develop a novel framework, Trans-Balance, that leverages recent advances in imbalance learning, transfer learning, and federated learning. We consider a practical setting in which data from multiple sites are stored locally under privacy constraints. RESULTS: We show that the proposed Trans-Balance framework improves upon existing approaches by explicitly accounting for heterogeneity across demographic subgroups and cohorts. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-cohort study of stroke risk prediction using data from participants in four large, NIH-funded cohorts. CONCLUSION: Our findings indicate that the Trans-Balance approach significantly improves predictive performance, especially in scenarios marked by severe class imbalance and demographic disparity. Given its versatility and effectiveness, Trans-Balance offers a valuable contribution to enhancing risk prediction in biomedical research and related fields.

Subject(s)

Algorithms; Biomedical Research; Humans; Cohort Studies; Machine Learning; Demography

3.

Knowledge-Driven Online Multimodal Automated Phenotyping System.

Xiong, Xin; Sweet, Sara Morini; Liu, Molei; Hong, Chuan; Bonzel, Clara-Lea; Panickan, Vidul Ayakulangara; Zhou, Doudou; Wang, Linshanshan; Costa, Lauren; Ho, Yuk-Lam; Geva, Alon; Mandl, Kenneth D; Cheng, Suchun; Xia, Zongqi; Cho, Kelly; Gaziano, J Michael; Liao, Katherine P; Cai, Tianxi; Cai, Tianrun.

medRxiv ; 2023 Oct 02.

Article in English | MEDLINE | ID: mdl-37873131

ABSTRACT

Though electronic health record (EHR) systems are a rich repository of clinical information with great potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features with an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, the features selected by ONCE show high concordance with state-of-the-art AI models (GPT-4 and ChatGPT) and facilitate large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared with other methods that require patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.

4.

Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program.

Verma, Anurag; Huffman, Jennifer E; Rodriguez, Alex; Conery, Mitchell; Liu, Molei; Ho, Yuk-Lam; Kim, Youngdae; Heise, David A; Guare, Lindsay; Panickan, Vidul Ayakulangara; Garcon, Helene; Linares, Franciel; Costa, Lauren; Goethert, Ian; Tipton, Ryan; Honerlaw, Jacqueline; Davies, Laura; Whitbourne, Stacey; Cohen, Jeremy; Posner, Daniel C; Sangar, Rahul; Murray, Michael; Wang, Xuan; Dochtermann, Daniel R; Devineni, Poornima; Shi, Yunling; Nandi, Tarak Nath; Assimes, Themistocles L; Brunette, Charles A; Carroll, Robert J; Clifford, Royce; Duvall, Scott; Gelernter, Joel; Hung, Adriana; Iyengar, Sudha K; Joseph, Jacob; Kember, Rachel; Kranzler, Henry; Levey, Daniel; Luoh, Shiuh-Wen; Merritt, Victoria C; Overstreet, Cassie; Deak, Joseph D; Grant, Struan F A; Polimanti, Renato; Roussos, Panos; Sun, Yan V; Venkatesh, Sanan; Voloudakis, Georgios; Justice, Amy.

medRxiv ; 2023 Jun 29.

Article in English | MEDLINE | ID: mdl-37425708

ABSTRACT

Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associated with one or more traits at experiment-wide significance (P < 4.6 × 10⁻¹¹) and fine-mapped 6,318 signals from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.

5.

Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data.

Guo, Xinzhou; Wei, Waverly; Liu, Molei; Cai, Tianxi; Wu, Chong; Wang, Jingshen.

J Am Stat Assoc ; 118(543): 1488-1499, 2023.

Article in English | MEDLINE | ID: mdl-38223220

ABSTRACT

There have been increasing concerns that the use of statins, among the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with an increased risk of new-onset type II diabetes (T2D). Nevertheless, to date there is no robust evidence on whether, and which, populations are truly vulnerable to developing T2D after taking statins. In this case study, leveraging biobank and electronic health record data from the Partners HealthCare System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure. Our proposed approach delivers asymptotically sharp confidence intervals and a debiased estimate of the treatment effect for the most vulnerable subgroup in the presence of high-dimensional covariates. With this approach, we find that females with high T2D genetic risk are at the highest risk of developing T2D due to statin usage.

6.

Efficient Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling.

Gronsbell, Jessica; Liu, Molei; Tian, Lu; Cai, Tianxi.

J R Stat Soc Series B Stat Methodol ; 84(4): 1353-1391, 2022 Sep.

Article in English | MEDLINE | ID: mdl-36275859

ABSTRACT

In many contemporary applications, large amounts of unlabeled data are readily available while labeled examples are limited. There has been substantial interest in semi-supervised learning (SSL), which aims to leverage unlabeled data to improve estimation or prediction. However, the current SSL literature focuses primarily on settings where labeled data are selected uniformly at random from the population of interest. Stratified sampling, while posing additional analytical challenges, is highly applicable to many real-world problems. Moreover, no SSL methods currently exist for estimating the prediction performance of a fitted model when the labeled data are not selected uniformly at random. In this paper, we propose a two-step SSL procedure for evaluating a prediction rule derived from a working binary regression model, based on the Brier score and the overall misclassification rate, under stratified sampling. In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for stratified sampling and to improve efficiency. In step II, we augment the initial imputations to ensure that the resulting estimators are consistent regardless of whether the prediction model or the imputation model is correctly specified. The final estimator is then obtained with the augmented imputations. We provide asymptotic theory and numerical studies illustrating that our proposals are more efficient than their supervised counterparts. Our methods are motivated by electronic health record (EHR) research and validated with a real data analysis of an EHR-based study of diabetic neuropathy.
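As a point of reference for the two quantities being estimated, the sketch below computes the inverse-probability-weighted Brier score and misclassification rate from the labeled subset alone. This is a supervised estimator of the kind the proposed two-step SSL procedure is compared against, not the proposed estimator itself. It assumes each labeled patient's stratum sampling probability is known by design and that predicted probabilities are classified at a 0.5 threshold; the function and argument names are hypothetical.

```python
def ipw_brier_and_misclassification(labels, predicted_probs, sampling_probs, threshold=0.5):
    """Inverse-probability-weighted (supervised) estimates of two accuracy measures.

    labels: observed 0/1 outcomes for the labeled subset
    predicted_probs: predicted probabilities from the fitted working model
    sampling_probs: known probability that each patient's stratum was sampled for labeling
    Returns (brier_score, misclassification_rate) as weighted averages over labeled patients.
    """
    # Each labeled patient stands in for 1 / pi patients in the full population.
    weights = [1.0 / pi for pi in sampling_probs]
    total = sum(weights)
    brier = sum(w * (y - p) ** 2 for w, y, p in zip(weights, labels, predicted_probs)) / total
    errors = sum(
        w * (y != (1 if p >= threshold else 0))
        for w, y, p in zip(weights, labels, predicted_probs)
    ) / total
    return brier, errors
```

Without the weights, over-sampled strata (for example, patients enriched for the condition) would dominate both averages and bias the performance estimates; the SSL procedure in the abstract keeps this weighting and adds imputed labels for the unlabeled patients to gain efficiency.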

7.

Weakly Semi-supervised phenotyping using Electronic Health records.

Nogues, Isabelle-Emmanuella; Wen, Jun; Lin, Yucong; Liu, Molei; Tedeschi, Sara K; Geva, Alon; Cai, Tianxi; Hong, Chuan.

J Biomed Inform ; 134: 104175, 2022 10.

Article in English | MEDLINE | ID: mdl-36064111

ABSTRACT

OBJECTIVE: Electronic Health Record (EHR)-based phenotyping is a crucial yet challenging problem in the biomedical field. Although clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such tasks challenging, time-consuming, and prohibitively expensive, leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems because they can leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, because they use only the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes these limitations. MATERIALS AND METHODS: WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver-standard labels; 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with the silver-standard labels as outcomes and high-dimensional EHR features as input; and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold-standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of the most informative EHR features as input. To assess the generalizability of WSS-DL across phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems.
Additionally, we determine the minimum number of training labels required for WSS-DL to outperform existing supervised and semi-supervised phenotyping methods. RESULTS: By combining the strengths of deep learning and weakly semi-supervised learning, the proposed method successfully leverages the crucial phenotyping information contained in EHR features from unlabeled samples. Indeed, the deep learning model's ability to handle high-dimensional EHR features allows it to generate strong phenotype status predictions from silver-standard labels. These predictions, in turn, provide highly effective features in the final logistic regression stage, leading to high phenotyping accuracy with notably small subsets of labeled data (e.g., n = 40 labeled samples). CONCLUSION: Our method's high performance in EHR datasets with very few labels indicates its potential value in aiding doctors to diagnose rare diseases as well as conditions susceptible to misdiagnosis.

Subject(s)

Electronic Health Records; Supervised Machine Learning; Algorithms; Logistic Models; Phenotype

8.

International comparisons of laboratory values from the 4CE collaborative to predict COVID-19 mortality.

Weber, Griffin M; Hong, Chuan; Xia, Zongqi; Palmer, Nathan P; Avillach, Paul; L'Yi, Sehi; Keller, Mark S; Murphy, Shawn N; Gutiérrez-Sacristán, Alba; Bonzel, Clara-Lea; Serret-Larmande, Arnaud; Neuraz, Antoine; Omenn, Gilbert S; Visweswaran, Shyam; Klann, Jeffrey G; South, Andrew M; Loh, Ne Hooi Will; Cannataro, Mario; Beaulieu-Jones, Brett K; Bellazzi, Riccardo; Agapito, Giuseppe; Alessiani, Mario; Aronow, Bruce J; Bell, Douglas S; Benoit, Vincent; Bourgeois, Florence T; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; DuVall, Scott L; Barrio, Noelia García; Hanauer, David A; Ho, Yuk-Lam; Holmes, John H; Issitt, Richard W; Liu, Molei; Luo, Yuan; Lynch, Kristine E; Maidlow, Sarah E; Malovini, Alberto; Mandl, Kenneth D; Mao, Chengsheng; Matheny, Michael E; Moore, Jason H; Morris, Jeffrey S; Morris, Michele; Mowery, Danielle L; Ngiam, Kee Yuan; Patel, Lav P; Pedrera-Jimenez, Miguel.

NPJ Digit Med ; 5(1): 74, 2022 Jun 13.

Article in English | MEDLINE | ID: mdl-35697747

ABSTRACT

Given the growing number of algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multinational network of healthcare systems. We predicted COVID-19 mortality using commonly measured baseline laboratory values and standard demographic and clinical covariates across healthcare systems, countries, and continents. Specifically, we trained a Cox regression model with nine measured laboratory test values, standard demographics at admission, and pre-admission comorbidity burden. These models were compared at the site, country, and continent levels. Of the 39,969 hospitalized patients with COVID-19 (68.6% male), 5717 (14.3%) died. In the Cox model, age, albumin, AST, creatinine, CRP, and white blood cell count were the most predictive of mortality. The baseline covariates were more predictive of mortality during the early days of COVID-19 hospitalization. Models trained at healthcare systems with larger cohorts largely retained good transportability performance when ported to different sites. The combination of routine laboratory test values at admission and basic demographic features can predict mortality in patients hospitalized with COVID-19. Importantly, this potentially deployable model differs from prior work by demonstrating not only consistent performance but also reliable transportability across healthcare systems in the US and Europe, highlighting the generalizability of this model and of the overall approach.

9.

Changes in laboratory value improvement and mortality rates over the course of the pandemic: an international retrospective cohort study of hospitalised patients infected with SARS-CoV-2.

Hong, Chuan; Zhang, Harrison G; L'Yi, Sehi; Weber, Griffin; Avillach, Paul; Tan, Bryce W Q; Gutiérrez-Sacristán, Alba; Bonzel, Clara-Lea; Palmer, Nathan P; Malovini, Alberto; Tibollo, Valentina; Luo, Yuan; Hutch, Meghan R; Liu, Molei; Bourgeois, Florence; Bellazzi, Riccardo; Chiovato, Luca; Sanz Vidorreta, Fernando J; Le, Trang T; Wang, Xuan; Yuan, William; Neuraz, Antoine; Benoit, Vincent; Moal, Bertrand; Morris, Michele; Hanauer, David A; Maidlow, Sarah; Wagholikar, Kavishwar; Murphy, Shawn; Estiri, Hossein; Makoudjou, Adeline; Tippmann, Patric; Klann, Jeffery; Follett, Robert W; Gehlenborg, Nils; Omenn, Gilbert S; Xia, Zongqi; Dagliati, Arianna; Visweswaran, Shyam; Patel, Lav P; Mowery, Danielle L; Schriver, Emily R; Samayamuthu, Malarkodi Jebathilagam; Kavuluru, Ramakanth; Lozano-Zahonero, Sara; Zöller, Daniela; Tan, Amelia L M; Tan, Byorn W L; Ngiam, Kee Yuan; Holmes, John H.

BMJ Open ; 12(6): e057725, 2022 06 23.

Article in English | MEDLINE | ID: mdl-35738646

ABSTRACT

OBJECTIVE: To assess changes in international mortality rates and laboratory recovery rates during hospitalisation for patients hospitalised with SARS-CoV-2 between the first wave (1 March to 30 June 2020) and the second wave (1 July 2020 to 31 January 2021) of the COVID-19 pandemic. DESIGN, SETTING AND PARTICIPANTS: This is a retrospective cohort study of 83 178 patients hospitalised between 7 days before and 14 days after PCR-confirmed SARS-CoV-2 infection within the Consortium for Clinical Characterization of COVID-19 by Electronic Health Record, an international multihealthcare system collaborative of 288 hospitals in the USA and Europe. The laboratory recovery rates and mortality rates over time were compared between the two waves of the pandemic. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome was the all-cause mortality rate within 28 days after hospitalisation, stratified by predicted low, medium and high mortality risk at baseline. The secondary outcome was the average rate of change in laboratory values during the first week of hospitalisation. RESULTS: Baseline Charlson Comorbidity Index and laboratory values at admission were not significantly different between the first and second waves. Laboratory values improved faster in the second wave than in the first. The average C-reactive protein rate of change was -4.72 mg/dL per day vs -4.14 mg/dL per day (p=0.05). The mortality rates within each risk category decreased significantly over time, with the most substantial decrease in the high-risk group (42.3% in March-April 2020 vs 30.8% in November 2020 to January 2021, p<0.001) and a moderate decrease in the intermediate-risk group (21.5% in March-April 2020 vs 14.3% in November 2020 to January 2021, p<0.001).
CONCLUSIONS: Admission profiles of patients hospitalised with SARS-CoV-2 infection did not differ greatly between the first and second waves of the pandemic, but there were notable differences in laboratory improvement rates during hospitalisation. Mortality risks among patients with similar risk profiles decreased over the course of the pandemic. The improvement in laboratory values and mortality risk was consistent across multiple countries.

Subject(s)

COVID-19; Pandemics; Hospitalization; Humans; Retrospective Studies; SARS-CoV-2

10.

Fast and powerful conditional randomization testing via distillation.

Liu, Molei; Katsevich, Eugene; Janson, Lucas; Ramdas, Aaditya.

Biometrika ; 109(2): 277-293, 2022 Jun.

Article in English | MEDLINE | ID: mdl-37416628

ABSTRACT

We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y ⫫ X | Z. The conditional randomization test was recently proposed as a way to use distributional information about X | Z to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about Y | (X,Z). This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test's statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
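The distillation idea can be illustrated with a deliberately simple sketch: the outcome model E[Y | Z] is fit once (here by one-covariate least squares, standing in for the machine learning fits the abstract describes), and each resample then only recomputes a cheap statistic instead of refitting a model. It assumes a scalar Z and a known Gaussian model for X | Z with user-supplied parameters; the function names are hypothetical, and the paper's screening and recycling tricks are omitted.

```python
import random


def fit_simple_ols(z, y):
    """Least-squares fit of y on a single covariate z; returns (intercept, slope)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    slope = szy / szz
    return y_bar - slope * z_bar, slope


def distilled_crt(x, y, z, x_given_z, n_resamples=500, seed=0):
    """Sketch of a distilled CRT p-value for H0: Y independent of X given Z.

    x_given_z: (intercept, slope, sd) of an assumed, known Gaussian model for X | Z.
    """
    rng = random.Random(seed)
    a, b, sd = x_given_z
    mu = [a + b * zi for zi in z]  # E[X | Z] under the assumed model

    # Distillation step: fit E[Y | Z] once, using Y and Z only (never X).
    c, d = fit_simple_ols(z, y)
    resid_y = [yi - (c + d * zi) for yi, zi in zip(y, z)]

    def statistic(x_vals):
        # Cheap statistic: |sum of products of the Y- and X-residuals given Z|.
        return abs(sum(r * (xv - m) for r, xv, m in zip(resid_y, x_vals, mu)))

    t_obs = statistic(x)
    exceed = 0
    for _ in range(n_resamples):
        # Resample X from its conditional law given Z; no model refitting needed.
        x_tilde = [rng.gauss(m, sd) for m in mu]
        if statistic(x_tilde) >= t_obs:
            exceed += 1
    # The +1 terms make the p-value valid in finite samples.
    return (1 + exceed) / (n_resamples + 1)
```

Because the distilled fit uses only (Y, Z), the resampled X values are exchangeable with the observed X under the null, which is what keeps the (1 + count)/(M + 1) p-value exactly valid even though the expensive fit is done only once.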

11.

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping.

Zhang, Yichi; Liu, Molei; Neykov, Matey; Cai, Tianxi.

J Mach Learn Res ; 23, 2022.

Article in English | MEDLINE | ID: mdl-37974910

ABSTRACT

Electronic Health Record (EHR) data, a rich source for biomedical research, have been successfully used to gain novel insight into a wide range of diseases. Despite their potential, EHR data are currently underutilized for discovery research because of a major limitation: the lack of precise phenotype information. To overcome this difficulty, recent efforts have been devoted to developing supervised algorithms that accurately predict phenotypes based on relatively small training datasets with gold-standard labels extracted via chart review. However, supervised methods typically require a sizable training set to yield generalizable algorithms, especially when the number of candidate features, p, is large. In this paper, we propose a semi-supervised (SS) EHR phenotyping method that borrows information from both a small labeled dataset (where both the label Y and the feature set X are observed) and a much larger, weakly labeled dataset in which the feature set X is accompanied only by a surrogate label S that is available for all patients. Under a working prior assumption that S is related to X only through Y, while allowing this assumption to hold only approximately, we propose a prior adaptive semi-supervised (PASS) estimator that incorporates the prior knowledge by shrinking the estimator towards a direction derived under the prior. We derive asymptotic theory for the proposed estimator and justify its efficiency and its robustness to poor-quality prior information. We also demonstrate its superiority over existing estimators under various scenarios via simulation studies and in three real-world EHR phenotyping studies at a large tertiary hospital.

12.

Individual Data Protected Integrative Regression Analysis of High-Dimensional Heterogeneous Data.

Cai, Tianxi; Liu, Molei; Xia, Yin.

J Am Stat Assoc ; 117(540): 2105-2119, 2022.

Article in English | MEDLINE | ID: mdl-37975021

ABSTRACT

Evidence-based decision making often relies on meta-analyzing multiple studies, which enables more precise estimation and investigation of generalizability. Integrative analysis of multiple heterogeneous studies is, however, highly challenging in the ultra-high-dimensional setting. The challenge is even more pronounced when individual-level data cannot be shared across studies, known as the DataSHIELD constraint. Under sparse regression models that are assumed to be similar yet not identical across studies, we propose a novel integrative estimation procedure for data-Shielding High-dimensional Integrative Regression (SHIR). SHIR protects individual data through a summary-statistics-based integration procedure, accommodates between-study heterogeneity in both the covariate distribution and model parameters, and attains consistent variable selection. Theoretically, SHIR is statistically more efficient than existing distributed approaches that integrate debiased LASSO estimators from the local sites. Furthermore, the estimation error incurred by aggregating derived data is negligible compared to the statistical minimax rate, and SHIR is shown to be asymptotically equivalent in estimation to the ideal estimator obtained by sharing all data. The finite-sample performance of our method is studied and compared with existing approaches via extensive simulations. We further illustrate the utility of SHIR in deriving phenotyping algorithms for coronary artery disease using electronic health record data from multiple chronic disease cohorts.
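SHIR itself handles sparse high-dimensional models with between-study heterogeneity, which this sketch does not attempt. The much simpler fact underlying summary-statistics integration is that, for ordinary least squares with a single shared model, each site's X^T X and X^T y are sufficient, so summing them reproduces the pooled estimate without moving patient-level rows. A minimal sketch, assuming one covariate plus an intercept and no heterogeneity; the function names are hypothetical.

```python
def site_summary(rows):
    """Per-site sufficient statistics for OLS with an intercept and one covariate.

    rows: list of (x, y) pairs held locally at one site.
    Returns the entries of X^T X and X^T y, where X has columns [1, x].
    """
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sxx = sum(x * x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    return n, sx, sxx, sy, sxy


def pooled_ols_from_summaries(summaries):
    """Sum the site-level summaries, then solve the 2x2 normal equations."""
    n, sx, sxx, sy, sxy = (sum(s[i] for s in summaries) for i in range(5))
    # Normal equations: [[n, sx], [sx, sxx]] @ [intercept, slope] = [sy, sxy]
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope
```

In the high-dimensional sparse setting this exact equivalence no longer comes for free, which is why the abstract's result that aggregation error is negligible relative to the minimax rate is the substantive contribution.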

13.

Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study.

Weber, Griffin M; Zhang, Harrison G; L'Yi, Sehi; Bonzel, Clara-Lea; Hong, Chuan; Avillach, Paul; Gutiérrez-Sacristán, Alba; Palmer, Nathan P; Tan, Amelia Li Min; Wang, Xuan; Yuan, William; Gehlenborg, Nils; Alloni, Anna; Amendola, Danilo F; Bellasi, Antonio; Bellazzi, Riccardo; Beraghi, Michele; Bucalo, Mauro; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; Follett, Robert W; García Barrio, Noelia; Hanauer, David A; Henderson, Darren W; Ho, Yuk-Lam; Holmes, John H; Hutch, Meghan R; Kavuluru, Ramakanth; Kirchoff, Katie; Klann, Jeffrey G; Krishnamurthy, Ashok K; Le, Trang T; Liu, Molei; Loh, Ne Hooi Will; Lozano-Zahonero, Sara; Luo, Yuan; Maidlow, Sarah; Makoudjou, Adeline; Malovini, Alberto; Martins, Marcelo Roberto; Moal, Bertrand; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Okoshi, Marina P; Omenn, Gilbert S.

J Med Internet Res ; 23(11): e34625, 2021 Nov 30.

Article in English | MEDLINE | ID: mdl-34889759

ABSTRACT

[This corrects the article DOI: 10.2196/31400.].

14.

Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data.

Hong, Chuan; Rush, Everett; Liu, Molei; Zhou, Doudou; Sun, Jiehuan; Sonabend, Aaron; Castro, Victor M; Schubert, Petra; Panickan, Vidul A; Cai, Tianrun; Costa, Lauren; He, Zeling; Link, Nicholas; Hauser, Ronald; Gaziano, J Michael; Murphy, Shawn N; Ostrouchov, George; Ho, Yuk-Lam; Begoli, Edmon; Lu, Junwei; Cho, Kelly; Liao, Katherine P; Cai, Tianxi.

NPJ Digit Med ; 4(1): 151, 2021 Oct 27.

Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, because of the large number of available codes, it is difficult to know all the relevant codes related to a phenotype. Traditional data mining approaches often require patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from the EHRs of two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. In addition, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared with lists of codified data generated by domain experts. Features identified via KESER yielded performance comparable to that of features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately than maps built from single-institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses, providing a significant advance in enabling multi-center studies using EHR data.

15.

International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study.

Weber, Griffin M; Zhang, Harrison G; L'Yi, Sehi; Bonzel, Clara-Lea; Hong, Chuan; Avillach, Paul; Gutiérrez-Sacristán, Alba; Palmer, Nathan P; Tan, Amelia Li Min; Wang, Xuan; Yuan, William; Gehlenborg, Nils; Alloni, Anna; Amendola, Danilo F; Bellasi, Antonio; Bellazzi, Riccardo; Beraghi, Michele; Bucalo, Mauro; Chiovato, Luca; Cho, Kelly; Dagliati, Arianna; Estiri, Hossein; Follett, Robert W; García Barrio, Noelia; Hanauer, David A; Henderson, Darren W; Ho, Yuk-Lam; Holmes, John H; Hutch, Meghan R; Kavuluru, Ramakanth; Kirchoff, Katie; Klann, Jeffrey G; Krishnamurthy, Ashok K; Le, Trang T; Liu, Molei; Loh, Ne Hooi Will; Lozano-Zahonero, Sara; Luo, Yuan; Maidlow, Sarah; Makoudjou, Adeline; Malovini, Alberto; Martins, Marcelo Roberto; Moal, Bertrand; Morris, Michele; Mowery, Danielle L; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Okoshi, Marina P; Omenn, Gilbert S.

J Med Internet Res ; 23(10): e31400, 2021 10 11.

Article in English | MEDLINE | ID: mdl-34533459

ABSTRACT

BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on its first- and second-wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random-effects meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of whom 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%).
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.

Subject(s)

COVID-19 , Pandemics , Adult , Aged , Female , Hospitalization , Hospitals , Humans , Male , Middle Aged , Retrospective Studies , SARS-CoV-2
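The METHODS paragraph of this abstract describes each site computing its own effect size and the central site pooling them with a random-effects meta-analysis. The abstract does not name the estimator, so the sketch below assumes the widely used DerSimonian-Laird method-of-moments estimator; the function name `dersimonian_laird` and all site numbers are hypothetical and are not taken from the study.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-site effect estimates with a DerSimonian-Laird random-effects model.

    effects   -- per-site effect sizes (hypothetical here, e.g. risk differences)
    variances -- within-site sampling variances of those effect sizes
    Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two sites, one variance per effect")
    # Inverse-variance (fixed-effect) weights and pooled mean
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted dispersion of site estimates around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    # Method-of-moments between-site variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to every site's own variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


# Hypothetical summaries from four sites: change in risk of severe disease
# (second wave minus first wave) and its sampling variance at each site.
site_effects = [-0.12, -0.08, -0.10, -0.05]
site_variances = [0.0004, 0.0009, 0.0006, 0.0012]
pooled, se, tau2 = dersimonian_laird(site_effects, site_variances)
print(f"pooled={pooled:.4f}, 95% CI=({pooled - 1.96 * se:.4f}, {pooled + 1.96 * se:.4f}), tau2={tau2:.5f}")
```

Only per-site summaries (an effect and its variance) enter the calculation, which is what makes this kind of pooling compatible with a federated design. When the sites agree closely, tau^2 is truncated to zero and the result reduces to the fixed-effect inverse-variance estimate.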

16.

National Trends in Disease Activity for COVID-19 Among Children in the US.

Hutch, Meghan R; Liu, Molei; Avillach, Paul; Luo, Yuan; Bourgeois, Florence T.

Front Pediatr ; 9: 700656, 2021.

Article in English | MEDLINE | ID: mdl-34307261

ABSTRACT

Ongoing monitoring of COVID-19 disease burden in children will help inform mitigation strategies and guide pediatric vaccination programs. Leveraging a national, comprehensive dataset, we sought to quantify and compare disease burden and trends in hospitalizations for children and adults in the US.

17.

International Analysis of Electronic Health Records of Children and Youth Hospitalized With COVID-19 Infection in 6 Countries.

Bourgeois, Florence T; Gutiérrez-Sacristán, Alba; Keller, Mark S; Liu, Molei; Hong, Chuan; Bonzel, Clara-Lea; Tan, Amelia L M; Aronow, Bruce J; Boeker, Martin; Booth, John; Cruz Rojo, Jaime; Devkota, Batsal; García Barrio, Noelia; Gehlenborg, Nils; Geva, Alon; Hanauer, David A; Hutch, Meghan R; Issitt, Richard W; Klann, Jeffrey G; Luo, Yuan; Mandl, Kenneth D; Mao, Chengsheng; Moal, Bertrand; Moshal, Karyn L; Murphy, Shawn N; Neuraz, Antoine; Ngiam, Kee Yuan; Omenn, Gilbert S; Patel, Lav P; Jiménez, Miguel Pedrera; Sebire, Neil J; Balazote, Pablo Serrano; Serret-Larmande, Arnaud; South, Andrew M; Spiridou, Anastasia; Taylor, Deanne M; Tippmann, Patric; Visweswaran, Shyam; Weber, Griffin M; Kohane, Isaac S; Cai, Tianxi; Avillach, Paul.

JAMA Netw Open ; 4(6): e2112596, 2021 06 01.

Article in English | MEDLINE | ID: mdl-34115127

ABSTRACT

Importance: Additional sources of pediatric epidemiological and clinical data are needed to efficiently study COVID-19 in children and youth and inform infection prevention and clinical treatment of pediatric patients. Objective: To describe international hospitalization trends and key epidemiological and clinical features of children and youth with COVID-19. Design, Setting, and Participants: This retrospective cohort study included pediatric patients hospitalized between February 2 and October 10, 2020. Patient-level electronic health record (EHR) data were collected across 27 hospitals in France, Germany, Spain, Singapore, the UK, and the US. Patients younger than 21 years who tested positive for COVID-19 and were hospitalized at an institution participating in the Consortium for Clinical Characterization of COVID-19 by EHR (4CE) were included in the study. Main Outcomes and Measures: Patient characteristics, clinical features, and medication use. Results: There were 347 males (52%; 95% CI, 48.5-55.3) and 324 females (48%; 95% CI, 44.4-51.3) in this study's cohort. There was a bimodal age distribution, with the greatest proportion of patients in the 0- to 2-year (199 patients [30%]) and 12- to 17-year (170 patients [25%]) age ranges. Trends in hospitalizations for 671 children and youth found discrete surges with variable timing across 6 countries. Data from this cohort mirrored national-level pediatric hospitalization trends for most countries with available data, with peaks in hospitalizations during the initial spring surge occurring within 23 days in the national-level and 4CE data. A total of 27,364 laboratory values for 16 laboratory tests were analyzed, with mean values indicating elevations in markers of inflammation (C-reactive protein, 83 mg/L; 95% CI, 53-112 mg/L; ferritin, 417 ng/mL; 95% CI, 228-607 ng/mL; and procalcitonin, 1.45 ng/mL; 95% CI, 0.13-2.77 ng/mL). 
Abnormalities in coagulation were also evident (D-dimer, 0.78 µg/mL; 95% CI, 0.35-1.21 µg/mL; and fibrinogen, 477 mg/dL; 95% CI, 385-569 mg/dL). Cardiac troponin, when checked (n = 59), was elevated (0.032 ng/mL; 95% CI, 0.000-0.080 ng/mL). Common complications included cardiac arrhythmias (15.0%; 95% CI, 8.1%-21.7%), viral pneumonia (13.3%; 95% CI, 6.5%-20.1%), and respiratory failure (10.5%; 95% CI, 5.8%-15.3%). Few children were treated with COVID-19-directed medications. Conclusions and Relevance: This study of EHRs of children and youth hospitalized for COVID-19 in 6 countries demonstrated variability in hospitalization trends across countries and identified common complications and laboratory abnormalities in children and youth with COVID-19 infection. Large-scale informatics-based approaches to integrate and analyze data across health care systems complement methods of disease surveillance and advance understanding of epidemiological and clinical features associated with COVID-19 in children and youth.

Subject(s)

COVID-19/epidemiology , Electronic Health Records/statistics & numerical data , Hospitalization/statistics & numerical data , Pandemics , SARS-CoV-2 , Adolescent , Child , Child, Preschool , Female , Global Health , Humans , Infant , Infant, Newborn , Male , Retrospective Studies

18.

International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study.

Weber, Griffin M; Hong, Chuan; Palmer, Nathan P; Avillach, Paul; Murphy, Shawn N; Gutiérrez-Sacristán, Alba; Xia, Zongqi; Serret-Larmande, Arnaud; Neuraz, Antoine; Omenn, Gilbert S; Visweswaran, Shyam; Klann, Jeffrey G; South, Andrew M; Loh, Ne Hooi Will; Cannataro, Mario; Beaulieu-Jones, Brett K; Bellazzi, Riccardo; Agapito, Giuseppe; Alessiani, Mario; Aronow, Bruce J; Bell, Douglas S; Bellasi, Antonio; Benoit, Vincent; Beraghi, Michele; Boeker, Martin; Booth, John; Bosari, Silvano; Bourgeois, Florence T; Brown, Nicholas W; Bucalo, Mauro; Chiovato, Luca; Chiudinelli, Lorenzo; Dagliati, Arianna; Devkota, Batsal; DuVall, Scott L; Follett, Robert W; Ganslandt, Thomas; García Barrio, Noelia; Gradinger, Tobias; Griffier, Romain; Hanauer, David A; Holmes, John H; Horki, Petar; Huling, Kenneth M; Issitt, Richard W; Jouhet, Vianney; Keller, Mark S; Kraska, Detlef; Liu, Molei; Luo, Yuan.

medRxiv ; 2021 Feb 05.

Article in English | MEDLINE | ID: mdl-33564777

ABSTRACT

Objectives: To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and to identify the optimal timing of laboratory value collection to predict severity across hospitals and regions. Design: Retrospective cohort study. Setting: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. Participants: Patients hospitalized with COVID-19, admitted before or after a PCR-confirmed result for SARS-CoV-2. Primary and secondary outcome measures: Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. Results: Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients were 50 years of age or older (78.7%) and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin were associated with disease severity. Significant differences in laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Subgroup analysis using age, D-dimer, CRP, and lymphocyte count as predictors of severity at admission showed discrimination similar to that of a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. 
On average, no difference in severity prediction was found between North American and European sites. Conclusions: Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.
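The abstract compares laboratory predictors by area under the ROC curve (AUC) but does not describe how it was computed. As a hedged illustration rather than the authors' code, the empirical AUC equals the Mann-Whitney probability that a randomly chosen ever-severe patient has a higher predictor value than a randomly chosen never-severe patient, with ties counted as one half. The function name `empirical_auc` and the CRP values below are hypothetical.

```python
def empirical_auc(scores_pos, scores_neg):
    """Empirical ROC AUC via the Mann-Whitney statistic.

    scores_pos -- predictor values for patients with the outcome (e.g. ever-severe)
    scores_neg -- predictor values for patients without it (e.g. never-severe)
    Returns the fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # pair ordered correctly
            elif p == n:
                wins += 0.5   # tie contributes half a pair
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical admission CRP values (mg/L) for two small groups
ever_severe = [120, 95, 60]
never_severe = [40, 60, 15, 30]
print(empirical_auc(ever_severe, never_severe))
```

Here 11.5 of the 12 pairs are ordered correctly, so the call returns 11.5/12 (about 0.958). The pairwise loop is O(n·m); for cohorts of tens of thousands of patients a rank-based formulation of the same statistic scales far better.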

19.

A high-throughput phenotyping algorithm is portable from adult to pediatric populations.

Geva, Alon; Liu, Molei; Panickan, Vidul A; Avillach, Paul; Cai, Tianxi; Mandl, Kenneth D.

J Am Med Inform Assoc ; 28(6): 1265-1269, 2021 06 12.

Article in English | MEDLINE | ID: mdl-33594412

ABSTRACT

OBJECTIVE: Multimodal automated phenotyping (MAP) is a scalable, high-throughput phenotyping method, developed using electronic health record (EHR) data from an adult population. We tested transportability of MAP to a pediatric population. MATERIALS AND METHODS: Without additional feature engineering or supervised training, we applied MAP to a pediatric population enrolled in a biobank and evaluated performance against physician-reviewed medical records. We also compared performance of MAP at the pediatric institution and the original adult institution where MAP was developed, including for 6 phenotypes validated at both institutions against physician-reviewed medical records. RESULTS: MAP performed equally well in the pediatric setting (average AUC 0.98) as it did at the general adult hospital system (average AUC 0.96). MAP's performance in the pediatric sample was similar across the 6 specific phenotypes also validated against gold-standard labels in the adult biobank. CONCLUSIONS: MAP is highly transportable across diverse populations and has potential for wide-scale use.

Subject(s)

Algorithms , Electronic Health Records , Humans , Phenotype

20.

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints.

Liu, Molei; Xia, Yin; Cho, Kelly; Cai, Tianxi.

J Mach Learn Res ; 22, 2021 Apr.

Article in English | MEDLINE | ID: mdl-37426040

ABSTRACT

Identifying informative predictors in a high dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high dimensional setting often fails due to the limited sample size. One approach to improving power is through meta-analyzing multiple studies which address the same scientific question. However, integrative analysis of high dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced with additional data sharing constraints under which only summary data can be shared across different sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection allowing between-study heterogeneity and not requiring the sharing of individual level data. Assuming the underlying high dimensional regression models of the data differ across studies yet share similar support, the proposed method incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). Theoretical comparisons of the new testing procedure with the ideal individual-level meta-analysis (ILMA) approach and other distributed inference methods are investigated. Simulation studies demonstrate that the proposed testing procedure performs well in both controlling false discovery and attaining power. The new method is applied to a real example detecting interaction effects of the genetic variants for statins and obesity on the risk for type II diabetes.
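DSILT controls the FDR and FDP with its own multiple testing procedure applied to debiased, integrative test statistics, and the abstract does not spell that procedure out. For orientation only, the sketch below shows the standard Benjamini-Hochberg step-up procedure. This is a different and much simpler technique, shown to illustrate what FDR control at level alpha means once per-covariate p-values are available. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns the (sorted) indices of rejected hypotheses. Controls the false
    discovery rate at level alpha when the p-values are independent or
    positively dependent.
    """
    m = len(p_values)
    # Rank hypotheses from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= k * alpha / m
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up: reject every hypothesis up to that rank, including any
    # whose own comparison failed at a smaller rank
    return sorted(order[:k_max])


# Hypothetical p-values for the overall effects of five covariates
print(benjamini_hochberg([0.001, 0.30, 0.012, 0.035, 0.02]))
```

The step-up rule is what distinguishes this procedure from a plain per-rank threshold: with p-values 0.03 and 0.04 at alpha = 0.05, the smaller one fails its own cutoff (0.025) but is still rejected because the larger one passes at rank 2.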
